Improve HTTP response handling and retry behavior#1342
Improve HTTP response handling and retry behavior#1342MichaelGHSeg wants to merge 34 commits intomasterfrom
Conversation
…d down to actual API
…y-After and better backoff timings
- Send X-Retry-Count header on all requests (0 for first attempt) - Add Basic auth using write key for browser dispatchers - Node publisher prefers OAuth Bearer, falls back to Basic auth Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Cap Retry-After header at 300 seconds (MAX_RETRY_AFTER_SECONDS) - Remove 413 from retryable status codes (payload won't shrink on retry) - Fix fetch-dispatcher to base64 encode Authorization header - Add test coverage for Authorization header in all dispatchers - Add tests for Retry-After capping behavior - Update test expectations for X-Retry-Count always being sent Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove conditional X-Retry-Count header logic in publisher.ts
- Always send X-Retry-Count starting with '0' on first attempt
- Update all test expectations from toBeUndefined() to toBe('0')
- Update T01 test name to reflect header is now sent
This aligns node behavior with browser dispatchers which already
always send the header.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update core and browser backoff defaults: minTimeout 100ms (was 500ms), maxTimeout 60s (was Infinity) - Update node publisher explicit backoff params to match - Update backoff tests to reflect new timing expectations - Aligns with analytics-java backoff timing (100ms base, 1min cap) This provides faster initial retries while preventing excessive delays, improving both responsiveness and resource efficiency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update node default maxRetries from 3 to 1000 - Update browser default maxRetries from 10 to 1000 - Aligns with analytics-java which uses 1000 max flush attempts With the shorter 100ms base backoff (60s max), higher retry limits allow better recovery while still respecting the backoff timing caps. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🦋 Changeset detectedLatest commit: a4ff990 The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #1342 +/- ##
==========================================
+ Coverage 91.23% 91.57% +0.33%
==========================================
Files 163 163
Lines 4393 4627 +234
Branches 1055 1133 +78
==========================================
+ Hits 4008 4237 +229
- Misses 385 390 +5
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
- Updated header expectations in http-integration.test.ts to include Authorization - Updated header expectations in http-client.integration.test.ts to include X-Retry-Count and Authorization - Added maxRetries: 3 to OAuth integration tests that expect exactly 3 retry attempts - Added maxRetries: 0 to timeout test to prevent excessive retries during timeout testing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the browser and node SDK HTTP dispatch/retry logic to align response handling (headers, retryability, backoff timing) with the analytics-java implementation.
Changes:
- Standardizes request headers (adds
AuthorizationandX-Retry-Count) and updates tests accordingly. - Revises retry policy (Retry-After handling + cap, retryable status allow/deny lists, standardized backoff bounds).
- Increases default retry budget (notably node default
maxRetriesto 1000) and adjusts token refresh retry behavior.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/node/src/plugins/segmentio/publisher.ts | New header handling + revised retry semantics including Retry-After support and new success/retry classification. |
| packages/node/src/plugins/segmentio/tests/publisher.test.ts | Expands coverage for new retry semantics, headers, and status handling. |
| packages/node/src/lib/token-manager.ts | Adjusts OAuth retry pacing, switches rate-limit handling to Retry-After, and fixes token validity logic. |
| packages/node/src/lib/tests/token-manager.test.ts | Adds/updates tests for token validity, Retry-After spacing, and refresh scheduling. |
| packages/node/src/app/analytics-node.ts | Raises default maxRetries to 1000. |
| packages/node/src/tests/test-helpers/assert-shape/segment-http-api.ts | Makes header assertions resilient to new required headers. |
| packages/node/src/tests/oauth.integration.test.ts | Updates OAuth integration expectations for Retry-After and retry limits. |
| packages/node/src/tests/http-integration.test.ts | Updates integration snapshots/expectations for auth + retry headers. |
| packages/node/src/tests/http-client.integration.test.ts | Updates expected outbound headers for node HTTP client integration. |
| packages/node/src/tests/emitter.integration.test.ts | Updates emitter integration setup for new retry behavior expectations. |
| packages/core/src/priority-queue/backoff.ts | Changes default backoff bounds (min 100ms, max 60s). |
| packages/core/src/priority-queue/tests/backoff.test.ts | Updates core backoff tests to match new bounds. |
| packages/browser/src/plugins/segmentio/ratelimit-error.ts | Adds flag to mark Retry-After retries as “doesn’t consume retry budget”. |
| packages/browser/src/plugins/segmentio/index.ts | Passes attempt count through to dispatcher and drops non-retryable failures. |
| packages/browser/src/plugins/segmentio/fetch-dispatcher.ts | Adds Basic auth header, supports Retry-After, and refines retryable status handling. |
| packages/browser/src/plugins/segmentio/batched-dispatcher.ts | Adds Basic auth + X-Retry-Count, introduces Retry-After handling, and changes batching behavior. |
| packages/browser/src/plugins/segmentio/tests/retries.test.ts | Updates retry semantics tests and adds header expectations. |
| packages/browser/src/plugins/segmentio/tests/fetch-dispatcher.test.ts | New unit tests for standard dispatcher status/Retry-After/header behavior. |
| packages/browser/src/plugins/segmentio/tests/batched-dispatcher.test.ts | Updates batching snapshots and adds retry/header semantics coverage. |
| packages/browser/src/lib/priority-queue/backoff.ts | Mirrors core backoff default changes for browser PQ backoff. |
| packages/browser/src/lib/priority-queue/tests/backoff.test.ts | Updates browser backoff tests to match new bounds. |
| packages/browser/src/core/mocks/analytics-page-tools.ts | Adds a mock module file (currently appears not wired up). |
| packages/browser/jest.config.js | Adds moduleNameMapper for @segment/analytics-page-tools. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
When buildBatch() splits events due to payload size limits, ensure remaining events are flushed: - On success: schedule flush for remaining buffered events - On retry exhaustion: drop failed batch but continue flushing remaining events This prevents events from being orphaned indefinitely when batches are split. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace shared totalAttempts counter with per-flush requestCount + isRetrying flag to fix race condition when pagehide sends concurrent chunks via Promise.all (each chunk now correctly gets X-Retry-Count: 0) - sendBatch takes retryCount parameter instead of using shared mutable state - Guard Authorization header behind if(writeKey) to avoid unnecessary encoding - Schedule flush for remaining events on both success and retry exhaustion paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add MAX_RETRY_AFTER_RETRIES (20) safety cap to prevent infinite retries when server keeps returning Retry-After headers (node publisher and browser batched-dispatcher) - Lower node maxRetries default from 1000 to 10 - Fix convertHeaders TS warning by removing redundant any cast - Fix retryCount metadata stamped on wrong events (batch.forEach instead of buffer.map) - Update tests: X-Retry-Count always sent (0 on first attempt), 413 now non-retryable, 300 treated as success, Retry-After cap Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Update priority-queue backoff thresholds for minTimeout 100ms (was 500ms): delay assertions lowered from >1000/2000/3000 to >200/400/800 - Update integration test to use objectContaining for headers since Authorization and X-Retry-Count are now always sent Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
packages/browser/src/plugins/segmentio/__tests__/fetch-dispatcher.test.ts
Outdated
Show resolved
Hide resolved
packages/browser/src/plugins/segmentio/__tests__/retries.test.ts
Outdated
Show resolved
Hide resolved
…ap test - Clamp parsed Retry-After to >= 0 in batched-dispatcher and fetch-dispatcher - Clamp clockSkew-adjusted wait time to >= 0 in token-manager - Add T21 test for MAX_RETRY_AFTER_RETRIES safety cap in publisher - Rename T01 test to reflect that header is '0', not absent Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The mock was never applied because moduleNameMapper in jest.config.js points to the real implementation, taking precedence over __mocks__. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…patchers Extract parseRetryAfter and getStatusBehavior helpers into shared-dispatcher, replacing duplicated hardcoded status code logic in both fetch-dispatcher and batched-dispatcher. Status behavior is now driven by httpConfig (default4xxBehavior, default5xxBehavior, statusCodeOverrides) from CDN settings, with sensible built-in defaults. Also wires httpConfig through settings.ts and the segmentio plugin index so CDN-provided HTTP configuration reaches the dispatchers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename rateLimitConfig.maxTotalBackoffDuration to maxRateLimitDuration (default 180s). Add computeBackoff() helper for exponential backoff with jitter. Wire backoff timing, cumulative backoff duration cap, and cumulative rate-limit duration cap into batched-dispatcher retry logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Node (packages/node): - Remove 408/503 from Retry-After eligibility (only 429 uses Retry-After) - Add rate-limit state to Publisher (rateLimitedUntil, rateLimitStartTime) - 429 with Retry-After: set rate-limit state, requeue batch, halt flush - 429 without Retry-After: counted backoff - Add maxTotalBackoffDuration / maxRateLimitDuration config (default 43200s) - Success clears rate-limit state - Add tests: T04, T19, T20 Browser (packages/browser): - Remove 408/503 from RETRY_AFTER_STATUSES (only 429) - Fix maxRateLimitDuration default from 180s to 43200s per SDD - Fix 429 pipeline blocking: isRetrying flag prevents new dispatches from bypassing scheduled retry - Add tests: T04, T19, T20; update 408/503 tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 29 out of 29 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
packages/browser/src/plugins/segmentio/__tests__/fetch-dispatcher.test.ts
Outdated
Show resolved
Hide resolved
Refactor Node publisher to use sleep-and-retry for 429 with Retry-After instead of resolve-and-return, aligning retry behavior with the Java SDK. Fix maxRetries default doc from 3 to 10 in settings. Add JSDoc clarifications for browser dispatcher rate-limit config fields. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add comment explaining 100-399 success range per SDD spec - Rename misleading test names to match actual assertions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- fetch-dispatcher: expect X-Retry-Count to be '0' on first call (the dispatcher always sends the header, defaulting via ?? 0) - publisher: 511 without a token manager is non-retryable (like 501/505), so only 1 attempt should be made, not M+1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…to response-status-updates
Add analytics.on('error') handler to capture delivery failures.
Reports success=false with the first error reason when any batch
fails (non-retryable error or retries exhausted).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Same change as the node e2e-cli: add analytics.on('error') handler
to capture delivery failures and report success=false.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…nitoring
The browser SDK's Segment.io plugin handles retries internally via
closure-scoped buffers and swallows all errors (never fires delivery_failure).
This made the previous approach of using analytics.on('error') and a fixed
3-second delay unreliable for retry testing.
Changes:
- Install a fetch monitor before SDK import to track API request activity
- Replace fixed delay(3000) with waitForDelivery() that settles based on
actual HTTP activity (3s after success, 6.5s after error to account for
the SDK's Math.random() * 5000 flush scheduling)
- Use fetch response statuses for error detection instead of analytics
error events
Results: 37/41 retry tests pass. 4 remaining failures are due to browser
SDK architectural limitations (maxAttempts hardcoded to 10, random flush
scheduling dominating backoff timing).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Emit delivery_failure errors when events are dropped (maxAttempts exhausted or NonRetryableError), matching Node SDK pattern - Wire httpConfig.backoffConfig.maxRetryCount to PriorityQueue when explicitly set, preserving retryQueue:false default behavior - Fix batched-dispatcher double-scheme URL bug (https://https://...) - Reduce schedule-flush jitter from 0-5000ms to 100-600ms so exponential backoff is the dominant retry delay - Update e2e-cli: error listener, httpConfig wiring, settle tuning Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The batched-dispatcher double-scheme bug is fixed in source, so the patch no longer applies. run-tests.sh handles stale patch references gracefully via --check guard. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
This PR improves HTTP response handling and retry behavior across browser and node SDKs to align with analytics-java implementation.
Key Changes
HTTP Headers:
X-Retry-Countheader starting with '0' on first attemptAuthorizationheaders with Basic auth (browser) and OAuth fallback (node)Retry Behavior:
Retry-Afterheader values at 300 seconds maximummaxRetriesto 1000 (was 3 for node, 10 for browser)Commits
Test Coverage
All tests passing: ✅
🤖 Generated with Claude Code